Normalized Information Distance

نویسندگان

  • Paul M. B. Vitányi
  • Frank J. Balbach
  • Rudi Cilibrasi
  • Ming Li
چکیده

The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, expecially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Persian sign language detection based on normalized depth image information

There are many reports of using the Kinect to detect hand and finger gestures after release of device by Microsoft. The depth information is mostly used to separate the hand image in the two-dimension of RGB domain. This paper proposes a method in which the depth information plays a more dominant role. Using a threshold in depth space first the hand template is extracted. Then in 3D domain the ...

متن کامل

Normalized Information Distance and the Oscillation Hierarchy

We study the complexity of approximations to the normalized information distance. We introduce a hierarchy of computable approximations by considering the number of oscillations. This is a function version of the difference hierarchy for sets. We show that the normalized information distance is not in any level of this hierarchy, strengthening previous nonapproximability results. As an ingredie...

متن کامل

Normalized information-based divergences

This paper is devoted to the mathematical study of some divergences based on the mutual information well-suited to categorical random vectors. These divergences are generalizations of the " entropy distance " and " information distance ". Their main characteristic is that they combine a complexity term and the mutual information. We then introduce the notion of (normalized) information-based di...

متن کامل

Chapter 3 Normalized Information Distance

The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wid...

متن کامل

3D Scene and Object Classification Based on Information Complexity of Depth Data

In this paper the problem of 3D scene and object classification from depth data is addressed. In contrast to high-dimensional feature-based representation, the depth data is described in a low dimensional space. In order to remedy the curse of dimensionality problem, the depth data is described by a sparse model over a learned dictionary. Exploiting the algorithmic information theory, a new def...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/0809.2553  شماره 

صفحات  -

تاریخ انتشار 2008